# Low-resource Deployment

Deepseek Ai DeepSeek R1 Distill Qwen 14B GGUF

DeepSeek-R1-Distill-Qwen-14B is a 14B-parameter large language model released by DeepSeek AI, distilled from DeepSeek-R1 onto a Qwen base model. It is offered here in multiple GGUF quantization versions that trade some accuracy for lower memory use.

Large Language Model

featherless-ai-quants

Nvidia AceReason Nemotron 7B GGUF

AceReason-Nemotron-7B is a large language model based on the Nemotron architecture with 7B parameters, offering multiple quantized versions to accommodate different hardware requirements.

Large Language Model
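As a rough illustration of why listings like these ship several quantized versions, the sketch below estimates the size of the weights alone for a given parameter count and tensor type. The function name `estimate_weight_gib` and the table `BITS_PER_WEIGHT` are hypothetical names introduced for this example. The Q8_0 and Q4_0 figures assume the usual GGUF block layout of 32 weights plus one 16-bit scale per block. Real files mix tensor types and add metadata, and inference also needs memory for the KV cache, so treat the numbers as a lower bound rather than a hardware requirement.

```python
# Bits per weight for a few GGUF tensor types (assumed block layout:
# 32 quantized weights plus one 16-bit scale per block for Q8_0 and Q4_0).
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": (32 * 8 + 16) / 32,  # 8.5 bits per weight
    "Q4_0": (32 * 4 + 16) / 32,  # 4.5 bits per weight
}


def estimate_weight_gib(n_params: float, quant: str) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bits = n_params * BITS_PER_WEIGHT[quant]
    return total_bits / 8 / 1024**3


if __name__ == "__main__":
    # Nominal parameter counts taken from model names in this list.
    for label, n_params in [("1.5B", 1.5e9), ("7B", 7e9), ("14B", 14e9)]:
        sizes = ", ".join(
            f"{quant}: {estimate_weight_gib(n_params, quant):.1f} GiB"
            for quant in BITS_PER_WEIGHT
        )
        print(f"{label} -> {sizes}")
```

Comparing the estimate against available RAM or VRAM gives a first filter for which quantization level of a model is worth downloading.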

Allura Org Q3 30B A3B Designant GGUF

A llama.cpp imatrix quantization of allura-org/Q3-30B-A3B-Designant, offered at multiple quantization levels and supporting role-playing and conversational tasks.

Large Language Model

AM Thinking V1 GGUF

AM-Thinking-v1 is a text generation model distributed in GGUF format, suitable for various natural language processing tasks.

Large Language Model

Primeintellect INTELLECT 2 GGUF

Quantized version of INTELLECT-2, produced with llama.cpp, offered in multiple quantization types to accommodate different hardware requirements.

Large Language Model

Andrewzh Absolute Zero Reasoner Coder 7b GGUF

A llama.cpp quantized version of andrewzh's Absolute_Zero_Reasoner-Coder-7b model, offered at multiple quantization levels and suitable for reasoning and code generation tasks.

Large Language Model

Nvidia OpenCodeReasoning Nemotron 14B GGUF

This is the llama.cpp imatrix quantized version of the NVIDIA OpenCodeReasoning-Nemotron-14B model, suitable for code reasoning tasks.

Large Language Model Supports Multiple Languages

Parakeet Tdt 0.6b V2 Onnx

NVIDIA Parakeet TDT 0.6B V2 is an automatic speech recognition (ASR) model for English speech-to-text, provided here in ONNX format.

Speech Recognition English

Goekdeniz Guelmez Josiefied Qwen3 8B Abliterated V1 GGUF

This is a quantized version of a Qwen3-8B-based model, produced with llama.cpp imatrix quantization and suitable for chat scenarios.

Large Language Model

Medra is a medical domain-specific QA and summarization model supporting English and Romanian, designed for medical AI applications.

Large Language Model Supports Multiple Languages

Allura Org Remnant Glm4 32b GGUF

Remnant-GLM4-32B is a 32B-parameter large language model based on the GLM4 architecture, supporting role-playing and conversational interactions, particularly suitable for salamander-related applications.

Large Language Model

Mlabonne Qwen3 8B Abliterated GGUF

This is the quantized version of the Qwen3-8B-abliterated model, quantized using llama.cpp, suitable for text generation tasks.

Large Language Model

Deepthink 1.5B Open PRM Q8 0 GGUF

Deepthink-1.5B-Open-PRM is a 1.5B parameter open-source language model, converted to GGUF format for use with llama.cpp.

Large Language Model English

Mistral Community Pixtral 12b GGUF

This is the quantized version of the pixtral-12b model, quantized using llama.cpp, supporting image-text-to-text tasks.

Gemma 2 9b It Abliterated GGUF

A quantized version of Gemma 2 9B, produced with llama.cpp and suitable for running in LM Studio.

Large Language Model English

Qwen2.5 1.5B Instruct GGUF

Qwen2.5 is a series of Qwen large language models. This entry is the 1.5B-parameter instruction-tuned model, which supports multilingual and long-text generation.

Large Language Model English

Llama OuteTTS 1.0 1B Bf16

This is a text-to-speech model in MLX format, supporting multiple languages and suitable for speech synthesis tasks.

Speech Synthesis Supports Multiple Languages

Deepcoder 1.5B Preview AWQ

DeepCoder-1.5B-Preview is a large language model for code reasoning, fine-tuned from DeepSeek-R1-Distill-Qwen-1.5B through distributed reinforcement learning, capable of handling longer context lengths.

Large Language Model

Transformers English

Slim Orpheus 3b JAPANESE Ft Q8 0 GGUF

This is a GGUF format model converted from the slim-orpheus-3b-JAPANESE-ft model, specifically optimized for Japanese text processing.

Large Language Model Japanese

Phi 4 Reasoning

Phi-4 Reasoning is a cutting-edge open-weight reasoning model based on Phi-4, fine-tuned with supervised chain-of-thought trajectory datasets and trained via reinforcement learning, specializing in mathematics, science, and programming skills.

Large Language Model

Transformers Supports Multiple Languages

MiniMaid-L2 is a role-play specialized model further optimized from MiniMaid-L1, achieving outstanding performance among 3B-scale models through knowledge distillation and training on a larger dataset.

Large Language Model

Transformers English

Open Thoughts OpenThinker2 7B GGUF

Quantized version of OpenThinker2-7B, using llama.cpp for quantization, suitable for text generation tasks.

Large Language Model

Orpheus 3b 0.1 Ft Q2 K.gguf

This model is a GGUF format conversion of canopylabs/orpheus-3b-0.1-ft, suitable for text generation tasks.

Large Language Model English

Orpheus 3b 0.1 Ft Q4 K M GGUF

This model is a GGUF-format conversion of canopylabs/orpheus-3b-0.1-ft, suitable for text generation tasks.

Large Language Model English

Gemma 2 2b It Tool Think

A text generation model fine-tuned from google/gemma-2b-it that supports step-by-step reasoning about tool calls.

Large Language Model

Gemma 3 12b It Int4 Gguf

Gemma 3 is a lightweight multimodal open model from Google that accepts text and image inputs and produces text outputs, with a 128K-token context window and support for 140+ languages.

Orpheus Bangla GGUF

This is the static quantized version of the asif00/orpheus-bangla-tts model, supporting Bengali text-to-speech tasks.

Speech Synthesis Other

Orpheus 3b 0.1 Ft Q6 K GGUF

This is a GGUF format model converted from canopylabs/orpheus-3b-0.1-ft, suitable for text-to-speech tasks.

Large Language Model English

Orpheus 3b 0.1 Ft Q2 K GGUF

This is a GGUF format model converted from the canopylabs/orpheus-3b-0.1-ft model, suitable for text generation tasks.

Large Language Model English

Qwen Encoder 0.5B GGUF

This is a statically quantized version of the knowledgator/Qwen-encoder-0.5B model, primarily designed for text encoding tasks.

Large Language Model English

Llama 3.1 Nemotron Nano 8B V1 GGUF

An 8B-parameter open-source large language model released by NVIDIA, based on Llama 3.1 and offered in multiple quantization versions.

Large Language Model English

Gemma 2 2b Jpn It Translate GGUF

A statically quantized model based on webbigdata/gemma-2-2b-jpn-it-translate, supporting translation tasks between Japanese and English.

Machine Translation Supports Multiple Languages

Beaverai MN 2407 DSK QwQify V0.1 12B GGUF

A 12B-parameter large language model supporting text generation tasks, released under the Apache-2.0 license.

Large Language Model

Lightblue Reranker 0.5 Bin Filt Gguf

This is a text ranking model used for reordering and scoring texts to improve the relevance of search results.

Gemma 3 12b It GGUF

Gemma 3 12B is a large language model that provides a quantized version in GGUF format, suitable for local deployment and use.

Large Language Model

Jbaron34 Qwen2.5 0.5b Bebop Reranker Newer Small Gguf

A 0.5B-parameter (roughly 500 million) text reranking model based on the Qwen2.5 architecture, suitable for information retrieval and document ranking tasks.

Large Language Model

Gemma 3 4b Pt Qat Q4 0 Gguf

Gemma 3 is a lightweight open model series launched by Google, built on the same technology as Gemini, supporting multimodal input and text output.

Open R1 OlympicCoder 7B GGUF

A llama.cpp quantization of open-r1/OlympicCoder-7B, a 7B-parameter large language model focused on code generation, offered at multiple quantization levels.

Large Language Model English

Financeconnect 13B I1 GGUF

FinanceConnect-13B is a 13B-parameter large language model specialized in the financial domain, supporting natural language processing tasks such as summarization, classification, and translation.

Large Language Model English

Whisper Large V3.w4a16

This is the quantized version of openai/whisper-large-v3, with weights quantized to INT4 and activations kept in FP16 (w4a16), suitable for vLLM inference.

Speech Recognition

Transformers English

© 2025 AIbase